Text Reduction-Enrichment at WebCLEF
Abstract
This paper reports the results of the single run we submitted to the Mixed Monolingual task of WebCLEF 2006. We used a text reduction process based on the selection of mid-frequency terms. Although this approach enhances precision, its recall must be improved by an enrichment process that adds terms with high co-occurrence. On the corpus used last year in the BiEnEs task, we obtained an improvement of 40%. However, our Mean Reciprocal Rank (MRR) values were low compared with those of the mixed monolingual task of WebCLEF 2005. We believe the low MRR stems from a poor preprocessing phase, but this issue must be investigated in detail.
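The abstract does not give the selection thresholds or the co-occurrence measure, so the following is only a minimal sketch of the general idea: keep terms whose corpus frequency falls in a middle band, then add back terms that frequently co-occur with the kept ones. The function names, the `low`/`high`/`min_cooc` parameters, and the document-level co-occurrence count are all assumptions made for illustration, not the authors' method. The last function computes MRR in its standard form (the mean of 1/rank of the first correct answer, counting 0 when no answer is found).

```python
from collections import Counter
from itertools import combinations


def mid_frequency_terms(docs, low=2, high=3):
    """Keep terms whose total corpus frequency lies in [low, high].

    The band limits are hypothetical; the abstract does not state them.
    """
    freq = Counter(term for doc in docs for term in doc)
    return {term for term, count in freq.items() if low <= count <= high}


def cooccurrence_counts(docs):
    """Count, for each unordered term pair, how many documents contain both."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


def enrich(selected, pairs, min_cooc=2):
    """Add every term that co-occurs at least min_cooc times with a selected term."""
    enriched = set(selected)
    for (a, b), count in pairs.items():
        if count >= min_cooc:
            if a in selected:
                enriched.add(b)
            if b in selected:
                enriched.add(a)
    return enriched


def mean_reciprocal_rank(ranks):
    """Standard MRR: ranks holds the 1-based rank of the first correct
    answer for each query, or None when no correct answer was retrieved."""
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Toy corpus of tokenized documents (illustrative data only).
docs = [
    ["clef", "web", "retrieval", "term"],
    ["clef", "web", "index", "term"],
    ["clef", "clef", "query", "retrieval"],
    ["clef", "page"],
]
reduced = mid_frequency_terms(docs)
expanded = enrich(reduced, cooccurrence_counts(docs))
```

On this toy corpus the reduction step drops the very frequent term `clef` and the singletons, and the enrichment step restores `clef` because it co-occurs with several kept terms in at least two documents. In a real setting the thresholds would have to be tuned on the collection.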
Similar resources
Vocabulary Reduction and Text Enrichment at WebCLEF
Cross-lingual Information Retrieval (IR) is currently one of the greatest challenges in the field. One of the most important issues in IR is reducing the corpus vocabulary so that methods such as the well-known vector space model become usable in real situations. In this work, we have considered a vocabulary reduction process based on the selection of mi...
BUAP-UPV TPIRS: A System for Document Indexing Reduction at WebCLEF
In this paper we present the results of BUAP/UPV universities in WebCLEF, a particular task of CLEF 2005. Particularly, we evaluate our information retrieval system at the bilingual “English to Spanish” task. Our system uses a term reduction process based on the Transition Point technique. Our results show that it is possible to reduce the number of terms to index, thereby improving the perform...
TPIRS: A System for Document Indexing Reduction on WebCLEF
In this paper we present the results of BUAP/UPV universities in WebCLEF, a particular task of CLEF 2005. Particularly, we evaluate our information retrieval system in the bilingual English to Spanish track. Our system uses a term reduction process based on the Transition Point technique. Our results show that it is possible to reduce the number of terms to index, thereby improving the performa...
Multilingual Web Retrieval Experiments with Field Specific Indexing Strategies for CLEF 2006 at the University of Hildesheim
For WebCLEF 2006 we experimented with the analysis and extraction of the HTML structure of the web documents. In addition, blind relevance feedback was applied in the search process. As in 2005, the experiments were carried out with a language independent indexing strategy. We experimented with HTML title, H1 element and other elements emphasizing text. Our index contained title and H1, emphasi...
The Impact of Input Enrichment in Long Text vs. Short Texts on Grammatical Accuracy in Writing Among Elementary Language Learners
This study was conducted to investigate the influence of teaching accurate grammar in writing via enriched long text and short text for the elementary students at Shokouhe_Farhang institute. The homogenized subjects were divided into two groups of 18 and 17 participants. A writing exam was used as a pretest to check the students’ knowledge of the English past tense. The control group received the...